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In the presence of multiscale dynamics in a reaction network, direct simulation methods become 
inefficient as they can only advance the system on the smallest scale. This work presents stochastic 
averaging techniques to accelerate computations for obtaining estimates of expected values and 
sensitivities with respect to the steady state distribution. A two-time-scale formulation is used to 
establish bounds on the bias induced by the averaging method. Further, this formulation provides 
a framework to create an accelerated ‘averaged’ version of most single-scale sensitivity estimation 
methods. In particular, we propose a new lower-variance ergodic likelihood ratio method for steady 
state estimation and show how one can adapt it to accelerate simulations of multiscale systems. 
Lastly, we develop an adaptive “batch-means” stopping rule for determining when to terminate the 
micro-equilibration process. 
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I. INTRODUCTION 

Stochastic simulations have been an essential tool 
in analyzing reaction networks encountered in biol¬ 
ogy, catalysis, and materials growth. However, it is 
commonplace for reaction networks to exhibit a large 
disparity in time scales. These multi-scale stochastic 
reaction networks can impose an enormous compu¬ 
tational burden in order to simulate them exactly. 
Exact techniques require computation of every reac¬ 
tion at the fastest time-scale, resulting in an expo¬ 
nentially growing load to observe dynamics on the 
slowest time-scale. Many works have attempted to 
develop approximate algorithms which allow faster 
computation with minimal loss of accuracy 

One approach, which we refer to as Stochastic 
Averaging, takes its inspiration from classical sin¬ 
gular perturbation theory of ordinary differential 
equations®^^^. The idea is that the fast dynam¬ 
ics come to quasi-equilibrium before the slow dy¬ 
namics take effect, hence one may model the slow 
time scale dynamics with their averages against the 
steady-state distribution of the fast dynamics. By 
estimating the steady-state expectations of the slow 
propensities, one can then jump the system ahead to 
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the next slow reaction and advance the time clock on 
the slow scale (skipping over needless computations 
of fast reactions). 

In addition, one often desires the sensitivities 
of the system Sf{9i) = ^Ee{/(A(t))} with re¬ 
spect to the reaction parameters 0i. The sensitiv¬ 
ities give important insight into the system, indi¬ 
cating directions for gradient-descent type optimiza¬ 
tion as well as determining bounds for quantifying 
the uncertainty^®. Current techniques for estimating 
the sensitivities have large variance, requiring many 
more samples than those for estimating Eg {/(A)} 
alone^®^^®. Thus computing sensitivities of multi¬ 
scale systems using single-scale techniques is often a 
computationally intractable problem. 

In this work, we use results from Two-Time-Scale 
(TTS) Markov chains^° to show the error of stochas¬ 
tic averaging algorithms is inverse to the scale dis¬ 
parity in the system. As opposed to the previous 
approaches of transforming the system variables into 
auxiliary fast and slow variables^^’^^, we partition 
the underlying (discrete) state space and derive a 
singular perturbation expansion of the correspond¬ 
ing probability measure. The first order term can 
then be identified from computables of the aver¬ 
aged process, leading to a rigorous theoretical frame¬ 
work for applying singular perturbation averaging 
for stochastic systems. 

Furthermore, this new formulation allows one to 
identify a macroscopic “averaged” reaction network 
on a reduced state space whose time-steps are on 
the macro (slow) time-scale. Thus, it provides a 



framework for applying single-scale sensitivity anal¬ 
ysis techniques to the multi-scale system. Pre¬ 
vious works have exploited similar model reduc¬ 
tion techniques to estimate sensitivities via finite 
differences^"^ or a “truncated” version of a likeli¬ 
hood ratio estimator^^. This work develops an accel¬ 
erated “Two-Time-Scale” version of the Likelihood 
Ratio (Girsanov Transform) method^®’^^“^^ for esti¬ 
mating system sensitivities of the multiscale system. 
The TTS-LR method computes sensitivity reweight¬ 
ing coefficients of the macro (reduced-state) pro¬ 
cess using a representation in terms of the steady- 
state sensitivities of the micro (fast) process. These 
micro-level sensitivities can in turn be computed on¬ 
line during the micro-equilibriation process. To this 
end, we propose a new lower-variance “Ergodic Like¬ 
lihood Ratio” estimator for approximating steady- 
state sensitivities of single and multi-scale systems. 

The outline of the remainder of the paper is as fol¬ 
lows: Section II gives the theoretical basis of the pa¬ 
per. The Two-Time-Scale formulation is presented 
and error bounds are established. Section III then 
uses the TTS framework for the purpose of sensi¬ 
tivity analysis. A new ergodic likelihood ratio esti¬ 
mator is developed for single-scale steady-state sen¬ 
sitivity analysis, and is then adapted to the mul¬ 
tiscale system. Section IV develops a batch-means 
stopping rule for determining when the micro-scale 
system has come to equilibrium. Numerical results 
are presented in Section V supporting the effective¬ 
ness of the methods presented. Concluding remarks 
are given in Section VI, and proofs of theorems are 
relegated to the Appendix. 


II. FORMULATION 

A. Markov Chain Model of Reaction Networks 

We briefly review the Markov chain model of 
reaction networks. While our motivation stems 
from chemical reaction networks, we note that much 
of the formulation carries over to general Markov 
chains on integer lattices. 

Suppose we have d species described by X{t) = 
[Xi{t),X 2 {t), ... ,Xd{t)\ G A1 C and M reac¬ 
tions ri, r 2 ,..., tm- In stochastic reaction networks, 
one views X[t) as a continuous-time Markov chain 
(CTMC) in the state space Xi. When reaction r 
fires at time t, the state is updated by the stoi¬ 
chiometric vector (^r so that X{t) = X(t—) -\- C,r- 
Given the set of reaction parameters 0 = [ 6 * 1 , 02 , ■ • ■ 
one characterizes the probabilistic evolution of X (t) 


by the propensity functions (intensity functions) 
Xr{x]6). The propensity functions are such that, 
given X(t) = a;, the probability of one or more fir¬ 
ing of reaction r during time {t, t-\-h] is Ar(a;; 6)h -\- 
o{h) as h —> 0; i.e. Xr{x;6) is the instantaneous 
rate/probability of reaction r firing. 

A common model for the propensities functions is 
that of mass-action kinetics. Under this assumption, 
the propensity functions are of the form 

d 

Xr(x, 0^ = 6j. ‘ b^{x) = 6^ ■ > 0 } 

i—l ^ ' 

( 1 ) 

where is the number of molecules of species i 
required for reaction r to fire. Mass action kinet¬ 
ics assumes the system is well-mixed, so molecular 
interactions are proportional to their counts. 

From the propensity functions Xr{x]6), one can 
construct the infinitesimal generator Q = Q{6) of 
the Markov chain. Viewed as an operator on func¬ 
tions / of the state-space M., we have 

M 

iQf) (a;) = 

r— 1 

For finite state-space Xi (bounded molecule counts), 
one may also view the generator Q{6) as a matrix. 
While the state-space is typically intractably large, 
the generator is sparse with only M -\- 1 non-zero 
entries in each row. 


QiO) 

/ 

V 


X + C,2 


X2{x\9) 


X ... x + C,i 


-Xo(x;9) ... Ai(x;0) 


where Ao(a:; 9) = ^r{x] 9). 

Writing Rr{t) to be the counting process repre¬ 
senting how many reactions of type r have fired 
by the time t, we have that X{t) = X(0) -b 
^r{t)C,r■ Using the random time change 

representation^®’^®, we write X{t) as 


X{t) = X{0) + J2Yr(^J^ A,(A(s);6/)ds^ Cr 


( 2 ) 


where W(‘) are independent unit-rate Poisson pro¬ 
cesses. This representation is tremendously useful 
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in conducting analysis of the trajectories. In partic¬ 
ular, it leads to formulations of the Next-Reaction 
Method^^’^® and interpreting simulated trajectories 
in the path-space to allow for coupling paths^®’^®’®° 
as well as path-wise differentiations®’^^. 

When simulating exact trajectories (using any ex¬ 
act method; Direct SSA, Next-Reaction, etc), the 
propensity functions Xr{x; 9) probabilistically deter¬ 
mine both the time between reactions At as well as 
the next reaction r* to fire. The likelihood that re¬ 
action Tfc is the next to fire is proportional to its 
propensity Afc(a;;0); i.e. f‘e,x{r* = Tk} oc Xkix\ 6 ). 
The time between reactions has an exponential dis¬ 
tribution with the rate Ao(x;0) = ^r{x',6) ; 

i.e. At ~ £xp {Xo{x- 0)) with the mean E 3 ,^e{At} = 
l/Ao(x, 0 ). 

Multi-scale dynamics occur when the propen¬ 
sity functions have large magnitude disparities. If 
Afc(x; 0) » Aj(x; 9) for all j ^ fc, then P{r* = r^} Ri 
1 and At ~ Exp {Xo{x;9)) fv £xp{Xk{x\9)). Thus, 
with a high probability the next reaction in an ex¬ 
act trajectory will be and the time clock will ad¬ 
vance on the order of \/Xk{x\9). Such multi-scale 
networks then require an enormous number of com¬ 
putations to sample “slow” reactions and reach the 
required time horizon for the entire system to relax. 


B. Two-Time-Scale Reaction Networks 

We now consider reaction networks with two scales 
of dynamics. For further motivation and discussion 
of reaction networks with multiple time-scales, we 
refer readers to Refs. 11-14 and references therein. 
We instead focus on our formulation for the separa¬ 
tion of time-scales and the averaged process via the 
partitioning of the state space into “fast-classes”. 
Though analogous to the techniques of transform¬ 
ing the species variables into auxiliary fast/slow 
variables^^’^^ or projecting to remainder spaces^'’’, 
the direct partitioning of the state space will allow 
us to construct a singular perturbation expansion 
of the probability measure and establish the rate of 
convergence of such averaging methods. In addition, 
it provides a framework for applying Likelihood Ra¬ 
tio type sensitivity estimates to the averaged process 
as we shall see in the sequel. 

Here, we assume that the disparity in the propen¬ 
sity functions results from magnitude disparities in 
the reaction parameters Oj.. In order to illustrate the 
stiffness, we consider the reaction network 

0‘1/s Pi a, 

* ^ A A^B 

OL2/e P2 


where e <C 1 is a measure of the scale dispar¬ 
ity (stiffness) between the fast reaction parame¬ 
ters ct = [ai,a 2 ] and the slow reaction parameters 
/3 = [Pi, 132 , f^s]- As the stiffness parameter e —>• 0 
the fast reactions a dominate the system, resulting 
in the multi-scale computational problem described 
above. 

In general, suppose a reaction network has species 
[Ai, A 2 ,...,, Xd[ and reactions ri,..., tm ■ We shall 
assume that the propensity functions Xr{x;9) are 
of the form (I) (mass-action kinetics, though other 
forms may also be treated), and that each reaction 
is indexed by its own reaction parameter Or- As in 
the illustrating example, we assume that there is a 
scale disparity in the reaction parameters between a 
set of “fast reactions” and a set of “slow reactions”. 
Thus we can write 9 = [ 61 ,... ,9m] = [oi/e,(3], 
where /3 = [Pi, P 2 , ■ ■ ■, PmP\ are the slow reaction 
parameters, e <C 1 is the stiffness parameter, and 
OL = [ai,... ,aM;] are the underlying (rescaled) re¬ 
action parameters for the fast reactions. 

To ease referencing, we will often index reactions 
and propensity functions directly by their reaction 
parameter. E.g., is the reaction with reaction 
parameter Pi and propensity function Xp^ {x; 9) = 
XpPx-,f3) = PibpPx) (where is given by (I)). For 
the fast reactions ai, we use A^. (x; 9) = A® . {x; a) = 
{ai/e)baPx) to denote the exact propensity function 
and Xai (x; 9) = ai ba^ (x) to denote the rescaled 
version. 

Let X^{t) denote the Markov chain determined by 
the exact propensity functions X%{x; 9) and Xp{x; 9). 
We can write the generator = Q‘^{a,(3) of the 
exact process as before, and observe that 


M 

{Q‘'f){x) = Xr{x; 9) (/(x -b Cr) - f{.x)) 

r—1 


Mf 

= -^Ac,(x;Q:)(/(x-bCa.)-/(a:)) 

Ms 

+ Y ( 2 ;; /3) (/(x -b Cft) - fix)) 


-Qia) + QiP) 


f (x) 


(3) 


where Q(/3) is the generator of the chain under only 
the slow dynamics (determined by slow reactions 
rp), and Q(a) is the generator of the chain under 
only the fast dynamics (with the rescaled propensity 
functions Xa(x;a)). Thus we have a decomposition 
of the generator into the fast and slow dynamics. 
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Q’^{a,f3) = {l/e)Q{a) + Q{(3) One can also view 
the generator Q’^{a,/3) = [\/e)Q[a) + Q(/3) as a 
matrix. In this case, we can write the element cor¬ 
responding to the rate of transition from state x to 
state y as 


Q(a) 


= 


Ol^CX. 

X 

a:ra{x)=y 


x = y 


x^y 


and similarly for 


Q{f3) 




As £ —>• 0 only the fast reactions Tq, fire, and so 
we define an equivalence relation on states s € M, 
by Si O Sj if they are mutually accessible through 
only fast reactions. This defines a partition of the 
state space M into “fast-classes” Mk which are by 
construction the invariant (irreducible) classes of Ai 
under Q{a)-, e.g. 


diag[(5(^^(Q:), Q(^)(q:), ..., is block- 

diagonal. Here, one can view the generators 
Q^^\a.) as the restriction of Q{oi.) to the (irre¬ 
ducible) fast-class Mk (fast-only dynamics when 
A'(O) G Mk)- In light of the finite state-space and 
positive recurrence, each Q^^^(a) is ergodic and 
has a stationary (steady-state) probability measure 
^(k) ^ such that = 0 (with 

interpreted as row vectors). 

Using the above formulation, we can restate the 
averaging principle®^^^ as follows. For small e and 
A'(O) G Mk, Ar^(-) will relax to its steady-state 
distribution on the micro time-scale et before 
any slow reaction fires (on the macro time scale t). 
Thus, one can use the stationary average of the slow 
propensity functions 

X0.{X it)-,f3,X{0)GMk) 

~ Aft iMk-,e) ^ {Aft iX;f3)} (4) 


Nc 

M=[jMk = [ 


Ji) ..( 1 ) 


( 1 ) M) 




^( 2 ) 
• ■ • 5 *^7712 


k^l 




where Nc are the number of invariant “fast-classes”, 
and TOfc = |A4fc| is the number of states inside fast- 
class Mk- For ease of presentation, in the present 
discussion we shall assume the state space is finite. 

Assumption II. 1 (Finite State Space). The state 
space M is finite, such that \M\ = m. Thus the 
number of fast classes Nc < oo and the number of 
states in each fast-class mk < oo, so that m = mi -|- 
7712 + • • • + - 

Assumption II. 1 is made only to simplify the dis¬ 
cussion. One may also treat the infinite state case 
with some mild additional conditions on Q{a) and 
Q{(3) to ensure non-explosiveness and ergodicity of 
the underlying (rescaled) chain^°. In addition, we 
shall impose the following assumption. 

Assumption II. 2 (Recurrent States). Each state 
of M is recurrent, so that there are no absorb¬ 
ing/transient states. 


to determine the distribution of time until the next 
slow reaction as well as the probabilities for the next 
slow reaction being rft. These can then be used 
to simulate a trajectory of the slow (macro-scale) 
process. We shall further develop this idea more 
precisely in the remainder. 

Write m = |A4|, and mk = |AIfe| as 

before, so that Q^{6),Q{ol),Q{(3) G 
QW(a) e and 77 ^( 0 :) e 

7 r(a) = diag ..., G R^cxm^ 

Write Imi: for [1,1,...,!]' e ^^d 1 = 


diag lmi5 lm 2 : - - - , 1-mAr 

With 77(0:) describing the limit behavior inside 
each fast-class on the micro time scale, one can then 
consider the distribution of the exact system X’^ on 
the macro time scale. Heuristically, one expects a 
trajectory to enter a fast-class of states Mki and 
quickly iterate through many fast reactions until 
the distribution of the trajectory reaches the steady- 
state Eventually, a slow reaction will fire to 

move the trajectory to a new fast-class Mk 2 (see 
Figure 1). Indeed, Writing 


Assumption II.2 is satisfied if, for example, all re¬ 
actions are reversible (or often times if only the fast 
reactions are reversible). One may also treat the 
case with transient/absorbing classes with some ad¬ 
ditional stability assumptions to ensure the fast dy¬ 
namics decay to steady-state; see Section 4.4 of Ref. 
20 for more details. 

Under Assumption II.2, we can re¬ 
order the state space so that Q{a) = 


Q = Q{a,f3) = 


7r{a) - Q{f3) ■ 1 


e R^cxNc^ 


(5) 


we see that Q is itself a generator of an “averaged” 
CTMC reaction network, whose “states” correspond 
to fast-classes Mk- Write X{t) for the “averaged” 
process generated by Q. Together, Q and X{t) de¬ 
scribe the limit (as £ —>■ 0) of the average rate that 
the exact process X^it) moves between the fast- 
classes {Mk}k=i via slow reactions. 
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FIG. 1: A depiction of the macro chain of a 
Two-Time-Scale Reaction Network. Each fast-class 
Alfe is an equivalence class of states accessible by 
fast-reactions Tq. Slow reactions rp carry each 
fast-class to a unique new fast-class. The averaged 
system X{t) forms a meta “macro” Markov chain 
among fast-classes with propensities 
A/3 


Furthermore, we can identify the elements of Q 
from the steady-state averages of the slow propensity 
functions. First, note that every slow reaction car¬ 
ries each fast class to a unique new fast-class; that is, 
if X -O' and \p{x),\p{y) > 0 , then rp{x) rp{y). 
Thus, rp{Mk) is well-defined. Then, using the form 
of Q together with (5), we have 

Qk^M= A/3(AIfci;0) 

/3e/3 (6) 

for ki ^ ^ 2 , and similarly we see that 

QkkM = -llMMk,-,e)^-Xp,{Mk,-,e). (7) 
/3e/3 

With this formulation, we see that generator Q cor¬ 
responds to a meta “macro” reaction network with 
the state-space M. = {AIi, AI 2 , ■ • ■, AIatc}, reac¬ 
tions {rp : ^ € /3} and propensities {Xp{X]0) : P € 
/3}. Figure 1 depicts such a macro chain for the 
macro process X(t). 

If we can estimate the average slow-propensities 

Xp{Mk',d) within each fast-class (say, through er- 
godic time averages of the fast-only process), then 
one can simulate a trajectory of the macro pro¬ 
cess X (t) from these average propensities using any 
single-scale Monte Carlo simulation (e.g. Direct 
SSA, Next-Reaction, etc). Furthermore, if one is ul¬ 
timately concerned with estimating Eg {f {X^{t))} 
for some observable (quantity of interest) f : M ^ 
R, then one can define an augmented functional / 


/(Alfc;a)^E^(.,(„){/(A)}= ^ f{x)ni^Hc^) 

xGMk 

( 8 ) 

It shall be shown that Ee{/(A(t))} Ri 
Eg{f{X^{t))} for large enough t and sufficiently 
small e. 

To illustrate how one can implement the averaging 
scheme to generate macro-trajectories, we present 
the following TTS vlersion of the Direct SSA (since it 
is the most succinct to write). In this case, the TTS 
SSA is essentially the same algorithm as in Refs. 11, 
9. However, we emphasize that the same method 
can be used to create a TTS version of any exact 
method defined by the propensity functions. In par¬ 
ticular, one can just as easily construct an analogous 
TTS Next-Reaction type algorithm^®^^'’ for tightly 
coupled trajectories. 

Algorithm 1 (TTS-SSA). To simulate a trajectory 
of the macro-process X{T) until macro time-horizon 

Tfinal • 

( 1 ) Initialize x at a macro time T; x € for some 
(unknown) k 

(2) Simulate the fast-only reaction network (a) 
until time-averages of observable / and slow 
propensities Xp relax to steady-state : 

jJ^f (xW(s)) ds ^ Es(.)(„) {/(A)} = 7(fc) 

i Xp 0 ) ds ^ E^(.)(„) {Xp (a; fd) } 

= Xp{k-,e) 

(3) Observe terminal state ~ Compute 

(4) Sample time to next slow reaction : AT ^ 
£xp(Xp„) 

(5) Sample next slow rxn to fire P* ^ 
1/A/3o [A/3i ! • ■ • 5 

( 6 ) Update macro time T ^ T -|- AT and move to 
the next fast class by taking x = x^^^ -I- C/ 3 * 

(7) Return to (1) until macro time horizon Tjinai 
is reached 
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C. Convergence and Error Bounds 

Here we use the above formulation of stiff net¬ 
works to establish convergence results and error 
bounds for the averaged process obtained by Algo¬ 
rithm 1. They are largely obtained by applying re¬ 
sults from Ref. 20 to the Two-Time-Scale Markov 
chain developed above. We give the statements be¬ 
low and defer to the Appendix for the proofs. 

From the exact chain define a stochas¬ 
tic process taking values in Ad = 

{Mi,...,Mnc} by X^{t) = Mk for G Mk- 

Note that X^{t) is not, in general, a Markov chain. 
However, one expects that as £ —>■ 0, the process 
X‘^{t) converges to X{t) in some sense. 

Proposition II.3 (Weak Convergence). Under As¬ 
sumptions II.2 and II.1, as £ —>■ 0, X'^(-) converges 
weakly to X{-) in the Skorohod space iD([0,T];At) 
for any time horizon T. 

The above proposition establishes weak conver¬ 
gence of the projection (onto fast-classes) of the ex¬ 
act system X^ to the averaged meta system X as 
£ —>■ 0. This is essentially the same result as estab¬ 
lished in Ref. 13, where the authors instead consider 
the disparity of the propensities as the system size 
(molecule count) N —> oo. In both formulations, 
one selects a reference scale and then examines limit 
behavior against the reference scale as the disparity 
increases (£ —^ 0 or iV —^ oo). However, in practice 
one implements the averaging procedure to approx¬ 
imate a system with a fixed, positive scale disparity. 
Naturally, one is then concerned about the induced 
error from the averaging approximation. 

Write pj. = prp{X;a., (3) for the probability mea¬ 
sure (on M) induced by the averaged process X at 
time T. At the end of a TTS simulation, one obtains 
a terminal state X{T) = x ^ pf = pf{X-,a,f3), 
where p^ is the probability measure on Ai induced 
by the last state observed from the terminal fast- 
class. Thus, py is determined by py and 7f(Q:). Write 
Pt — CK, (3) for the probability measure on M. 

induced from the exact process X^. Since is the 
distribution we would see from an exact simulation, 
and py is the distribution from the TTS simulation, 
the question becomes: What is the error of p^ from 
p|.? One can take a singular perturbation expansion 
of py in terms of e and identify the leading term as 
p^ to obtain the following result. 

Theorem II.4 (Error in Probability). Let Ik = 
— i max < Re(i^) : v is a non-zero 


Then under Assumptions II. 1, II.2, we have 

IIpt-PtII < 0(£-f exp{-KT/£}) (9) 

where || • || denotes the I 2 norm. 

In Theorem H.4, k is the slowest rate of conver¬ 
gence of to the steady-state among all fast- 
classes Aik- Thus, as long as the macro time horizon 
T is large enough to ensure the fast dynamics have 
relaxed to steady state (T > —£/Klog(£)), then the 
error becomes \\pj. — p^\\ < 0 {£). 

Writing 7r°(Q:,/3) for the stationary distribu¬ 
tion corresponding to the TTS probability measure 
p^{a,f3), it is not hard to see that Tr^{a,f3) = tt-^, 
the product of the steady-state distribution between 
fast-classes 7 r{a., (3) and the steady-state distribution 
within fast-classes Tr{a). Write tt^ for the steady- 
state distribution corresponding to the exact process 
generated by Q^. Then using Theorem H.4 and ex¬ 
ponential convergence to the steady state, we obtain 
the following error bounds. 

Corollary II.5 (Error in Expectation). Under As¬ 
sumptions II. 1 and II. 2, ||7r*^ — 7r®|| < 0(e) and 
— Pyll < 0{e) for sufficiently large T. Thus, 
for all bounded functions f on the state space A4, 

{7(X(r))} - {f(X^(T))}\ < 

II/I|oo||p7-p^II<0(^) 

\E^{7(X)} -E^e{f{X^)}\< 

||/||oo|k°-^1|<0(£) 

Corollary H.5 is of great practical use, as it says 
that the expected value of the macro-process X{T) 
with macro-observable f : Ai ^ M. provides an 0(e) 
estimate of the expected value of the exact system 
X®(r) with observable / : A4 —>■ R. Since we can 
use TTS algorithms (such as Algorithm 1) to quickly 
generate trajectories of X(T) while estimating the 
macro-observable f(Aik) at each state along the 
way, this provides a method to very quickly gener¬ 
ate estimates of Ep|, {f(X^(T))} with at most 0(e) 
bias. As £ —>■ 0, the bias decreases linearly while the 
computational savings increase as 0(l/£). 

III. TWO TIME SCALE SENSITIVITY ANALYSIS 

Computing the system sensitivities Sf^T(Si) = 
^Ee{/(X(T))} with respect to reaction parame¬ 
ters 9i G 9 provides great insight into the model. 
As such, numerous works have constructed and an¬ 
alyzed methods to estimate the sensitivities from 
■sample trajectories of the 


eigenvalue of a j. 


6 


Different methods work better for different sys¬ 
tems or different criteria, but all methods have 
higher (sometimes stupendously higher) variance in 
the estimation of Sf_T{di) compared to the estima¬ 
tion of Ee {f{X{T))}, thus requiring a very large 
number of samples to estimate the sensitivity pre¬ 
cisely. If the system is stiff (as in (3)) so that 
each exact trajectory X^{T) requires a prohibitively 
large computational load, then the large number of 
sample paths required to estimate the sensitivity 
{/(^^('T))} make the problem 
computationally intractable. 

Corollary II.5 gives that the expectation of macro 
“averaged” reaction network f{X(T)) gives an accu¬ 
rate approximation of the expectation of the exact 
network; Ep^(e) {J(X{T))} = Epe,(e) {f{X^{T))} + 
0{e). A natural question to ask is whether the sensi¬ 
tivities of the exact system converge to the sensitiv¬ 
ities of the averaged system. Using the recent result 
of Ref. 14, we can derive the following (the details 
are deferred to Appendix). 

Proposition III. 1. Convergence of Sensitivities 

= Sf,Tm = Ui^iT))} 

( 11 ) 

Thus, if we can compute the sensitivity of the 
macro reaction network f{X{T)) (whose sample 
paths have orders of magnitude less cost to simulate 
than the exact stiff network /(X'^(T))), then this 
provides an accurate estimate of the exact sensitiv¬ 
ity. Furthermore, since X{-) is formulated as a re¬ 
action network with propensities {A/ 3 (a;, 0) : (3 € f3} 
and observable values f{Aik) (both of which are esti¬ 
mated during a TTS simulation), we can apply most 
of existing single-scale sensitivity estimation meth¬ 
ods to estimate Sf,T{di) and thus S‘jrp[9i). 

We note that (11) gives that the sensitivity of the 
exact system converges to the sensitivity of the av¬ 
eraged system, but does not give the rate of conver¬ 
gence. Currently, this is an open question. Since 
from (10) we have the expectation converges at a 
rate 0 {e), one might suspect that the sensitivity also 
converges at rate 0(e), at least for certain classes of 
networks (e.g. linear propensities). Ongoing work 
aims to establish the rate of convergence via singu¬ 
lar perturbation expansions of sensitivity reweight¬ 
ing measures. However, the remainder of this work 
shall focus on the development and practical imple¬ 
mentation of a multiscale Likelihood Ratio estimator 
of the limit sensitivity Sf_T{9i). 

In what follows, we review the Likelihood Ratio 


method for computing system sensitivities for single¬ 
scale reaction networks. Furthermore, we shall intro¬ 
duce a new Ergodic Likelihood Ratio method which 
has much smaller variance when estimating sensitiv¬ 
ities at steady-state. We then derive a Two-Time- 
Scale version that allows one to estimate the full gra¬ 
dient of a stiff system using any TTS Monte Carlo 
method for simulating a macro trajectory. 


A. Likelihood Ratio Methods 


Likelihood Ratio (LR) methods^®’^^^^''’’^^ (aka the 
Girsanov Transform Method) attempt to compute 
the derivative by reweighting the observed trajec¬ 
tory by its “score” function of the density. Here, one 
views 0 as parametrizing the probability measure on 
the path-space P(-, t; 0). If P(-, t; 0) is differentiable 
with respect to 9i, then under mild regularity con¬ 
ditions we have 


Sf,t{9.) = {fixm 


= [ /(X(t,w)) 
Jn 


J-LoP(dw,t;0) 


P{dw,t;e°) 


P{du!, t; 9) 

= E^o {fiXit))WeM} ( 12 ) 


Using the random-time-change representation (2) it 
can be shown^^ that the reweighting process We^ (t) 
is a zero-mean martingale process and can be repre¬ 
sented by 





Ar (X(s-),9°) 


dRr(s) 



(X(s-),9°) ds, 


(13) 


where dRr{s) is simply the counting measure of re¬ 
action r which equals 1 at times s at which reaction 
r fires and is zero otherwise. Thus, assuming one 
can compute Arix, 9), then We^it) has a compu¬ 
tationally tractable form as follows. 

We write Xi for the l-th state of the system 
through a trajectory, and A; for the holding time in 
the l-th state. Write Ti for the time of the I jump, 
so that Ti = ■ We denote N (t) as the total 

number of reactions which have fired by time t and 
r* for the index of the reaction which takes the sys¬ 
tem from the l-th state to the I -\- l-th state. Then 
Wg. has the explicit form 
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WbM = 

N{t)-i r 

F — 


M „ 


r—1 


M 

[T-TNiT)]- 


In simulation, the LR estimate is computed via 
ensemble averages estimated by empirical averaging 
SfT{9i) Ri LR(iVs,0i) with the empirical estimator 


Ns 

i^{Ns,e.) ^ ^ g [/^))] (14) 

n—1 

where Ns is the number sample paths, [f{x(T))]^ is 
the observable value at terminaRtime T for the nth 
sample path, and similarly [Wg^ (2^)],! i® terminal 
value of Wg. (T) for the nth sample path. While the 
reweighting process Wg^ (<) has zero mean, its vari¬ 
ance grows with time^®’^^, making it quite inefficient 
for large time horizons. The variance can be reduced 
by using the centered likelihood ratio estimate 

CLR(iVs,0,) = LR(^s,0z) 

-^{s:i/wb)i.}|i:i'bR)L}^ (15) 

Since the 'Ee{Wg.{T)} = 0, the second term doesn’t 
impose any bias into the estimate (12), but is 
coupled to the first term to reduce the observed 
variance^®. 

Suppose one is interested in the steady-state sen¬ 
sitivities, Sf^ooiOz) = dg^E^(^g){f{X)}. It is well 
known that Ep^(^g) {f{X{T))} = E^(e) {/(A)} -f 
O (e”'^^) for some mixing rate k, and thus for 
large T one can use the terminal distribution of 
f{X{T)) and Wg. (T) in (12) to obtain an estimate of 
the steady-state sensitivity with exponentially small 
bias^®. However, the major difficulty in using like¬ 
lihood ratio estimates is the large variance of the 
estimator f{X{T))Wg.(T), which is proportional to 
Var{/(A(T))}Var{Wei(T)®®>^^. It can be seen that 
Var {He- (T)} = 0{T), so one must balance choosing 
a terminal time T large enough to ensure sufficient 
decay of the bias {/(A(T))} - {/(A)}, 

yet as small as possible to contain the growth of the 
Var{lTg. (T)}. While centering as in (15) helps to 
reduce the variance of the estimator, the variance 


is usually much larger than comparable finite differ¬ 
ence of pathwise derivative methods®®^^®. 

Instead of using the terminal distribution / (A(T)) 
as an approximation of the steady-state distribution, 
one could instead use the ergodic-average (time- 
average) 1/T Jg f{X{s))ds. The bias of the ergodic- 
average decays slower than the terminal distribu¬ 
tion (0(l/r) compared to 0(e“"’^)), but has the 
advantage that variance decays with time as well; 

that is, Var |l/r jJ"/(A(s))ds| = 0(l/r) whereas 

Var{/(A(T))} ^ Var{/(A(oo))} = (see Ref. 
34 for more details). 

Motivated by the variance reduction one obtains 
with ergodic averaging, we introduce a new method 
for computing likelihood-ratio type steady state sen¬ 
sitivity estimates. The idea is to simply replace 
the terminal-state observable /(A(T)) with the er¬ 
godic average l/T f(X(s))ds in the LR scheme 
(12) - (14). The philosophy is that by incurring 
some small amount of additional bias in the mean 
value, the ergodic steady-state sensitivity estimate 
has much smaller variance than the terminal-state 
distribution. We shall refer to this method the er¬ 
godic likelihood ratio, 


ELR(As,0,)^ 


1 



f{x{s))ds 


[WgAT)]r.- 


(16) 


Similarly, one can center the ELR to derive the cen¬ 
tered ergodic likelihood ratio CELR, 


CELR(As, e,) = ELR(As, 0,) 



Ns ) 

E ■ 

n^l J 

(17) 


In the numerical experiments, it shall be seen that 
the CELR method performs much better than the 
CLR for steady-state sensitivity estimation. 


B. TTS Likelihood Ratio 

In what follows, we describe how the above single¬ 
scale Likelihood Ratio methods can be adapted to 
the macro-process X{T) for use in (11). Recall 
that the reaction parameters can be classified as 
fast or slow, 6 = [ct, /3] with a = [ai,..., ccm/] and 










(3 = [(3i,..., i3mJ- To apply Likelihood Ratio meth¬ 
ods to compute SgiEe {/(^(T)}, we exploit that the 
macro process X{T) is identified as reaction network 
with propensities 

^i3riX;a,f3) = 

x^A4-^ 


(for Pr & A), and observable 

7(X;a)=E-(x)(„){/(X)}= ^ fix)n^^\x;a) 
Thus the macro-sensitivities can be represented by 


^^Ao){f{XiT);0)} 

= ^AO) {^7 {X{T);e) + 7 (X; 0) ■ir,,(T)| 

(18) 


where the macro-reweighting process VLe^(T) is 
given by 


Ms 




»T _£ 


(X(s);0) 


Ms „T 

do. 


dRpr (s) 


-E/ ■§fr,^Pr{ns)-,0)ds. (19) 


Therefore, in order to apply (18) we need to be able 
to compute the derivatives of the averaged observ¬ 
able f (^(s); 0) as well as the derivatives of the 
averaged propensity functions Xp^ (X(s);d). 

Suppose 0i = £ /3 is a slow reaction parame¬ 

ter. If the original observable f(X) has no direct pa¬ 
rameter dependence, then 9^. f(X{s),0) = 0. Fur¬ 
thermore, under mass-action kinetics, the averaged 
propensities have ^p^ Xp^{X{s);6) = bp^{X] OL)Si^r, 
where hpAX\ a) = l//3r ■ Xp^{X; a) is already com¬ 
puted during a TTS simulation and Si^r = 1 ii i = r 
and 0 otherwise. Thus the slow sensitivities are di¬ 
rectly computable from a standard TTS simulation. 

Suppose 9i = ai £ a is a fast reaction 
parameter. Then computing 9a^/(X(s); a) and 
9aiXp^{X(s); a, /3) is more problematic, as they only 
depend indirectly on a through the fast-class steady- 
state measures n(a). Thus explicit computation 
is often infeasible. However, one may estimate 
{/(^)} and 9a, E~(y)(„) {A/j^(X);/3} 


through any sensitivity analysis method from a sim¬ 
ulation with only fast reactions. For example, when 
running the fast-only simulation (under Q^^Aa)) 
for equilibration in Algorithm 1, one can compute 

the corresponding likelihood ratio process WA {t) 
as in (13) (with t large enough so that p) 7°^) ~ 
7f*^^7o:))- Then one can estimate the derivitives in 

(18) , (19) by 

E7(X;a) E_,x, {f{Xit))Wo,At)} 

^XpAX;e) ^ E_,x, {xpAxmWo^At)}, 

using the proposed CELR method (17) during the 
micro-equilibration computation. Plugging these es¬ 
timated values into (19) allows one to calculate Wat 
for each macro-trajectory, which in turn allows for 
sensitivity estimation with respect to ai in (18). 

We note that our derivation leads to a different 
form of the multi-scale LR estimator compared with 
Ref. 21. The latter estimated the reweighting mea¬ 
sures for the exact process by adding to¬ 

gether the micro reweighting measures Wq,, from 
within each fast class visited, the idea being that 
Wcii (t) is a zero-mean martingale which adds no 
new information and only increases in variance once 
the fast-only process has converged to steady-state. 
Henceforth, we refer to this approach as the “Trun¬ 
cated Likelihood Ratio”, as it approximates the ex¬ 
act reweighting coefficient W^{t) via a truncated 
observation within each fast-class. Conversely, the 
TTS Likelihood Ratio uses the exact representation 

(19) for the macro process, and then estimates the 
terms via (20) 

Lastly, we note that the above procedure will esti¬ 
mate sensitivities with respect to the parameter set 
0 = [a;,/3]. However, the original goal was to esti¬ 
mate sensitivities with respect to 0^ = [a/e,/?] = 
[a^,/3]. The sensitivities of the slow parameters f3 
are the same, but the TTS Sensitivity scheme com¬ 
putes fast sensitivities against the rescaled param¬ 
eter ai rather than against the original parameter 
af = tti/e. However, it can be shown (Appendix A) 
that at steady-state we have 

^E,e {/(X; 0")} = {f{X; 0")} . (21) 

Therefore, by multiplying the TTS sensitivity esti¬ 
mate (against ai) by a factor of e, one thus obtains 
the estimate against the original parameter af. Thus 
one can use the TTS scheme to estimate the full gra¬ 
dient of VflsE^r' {f{X-,9^)}. 
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IV. BATCH-MEANS STOPPING RULE 


A crucial question when implementing a TTS sim¬ 
ulation is: How long to run the micro-equilibration 
for? That is, how large a value of t does one use to 
compute the ergodic averages 

7{X;e) jy(^X<X\s); 9 ) ds 

for a desired level of accuracy 6? Taking too small 
a value for t risks imposing a large bias. However, 
the 0{l/t) rate of convergence for the ergodic aver¬ 
age implies almost nothing is gained by integrating t 
past the relaxation time of the system. Furthermore, 
when computing the micro-sensitivities one uses the 
micro-reweighting process Wa. (t) by 

{V;e) {a« (Xity.p) w„,(i)}, 

where the variance of Wa^ (t) increases with the time- 
horizon t. Thus, we would ideally take the smallest 
value of t such that How¬ 

ever, different fast-classes X can have vastly differ¬ 
ent sizes. This can result in significantly different 
relaxation times for each class. It is then ideal to 
have an adaptive stopping rule which terminates 
the micro (fast-only) simulations when the ergodic 
averages have converged to the steady state mean. 

Current implementations of an “averaged” or 
“multi-scale” SSA use a constant relaxation time tf 
for the micro-averaging step^^’^^ whose choice is mo¬ 
tivated by some a priori insight into the system. In 
Refs. 10, 35 the authors also use a fixed time t, but 
then exploit algebraic relations of the steady-state 
means to try to obtain better approximations. In 
Ref. 9, a stopping rule is developed which deter¬ 
mines that equilibrium is reached when the aver¬ 
aged values of the propensities of the forward and 
backward reactions are approximately equal for each 
reaction pair. However, experience has shown this 
“partial-equilibrium” stopping rule can stop pre¬ 
maturely (in the transient regime) with significant 
probability for systems with relatively few reaction- 
pairs. Thus, we seek to obtain a robust, adaptive 
stopping rule for terminating the micro-equilibration 
simulation. 


A. Batch-Means for Steady-State Estimation 


The problem at hand is really one about Markov 
chain mixing-times and the integrated autocorrela¬ 
tion time Tint- Analytically, the mixing and inte¬ 
grated autocorrelation times are related to the spec¬ 
tral gap of the underlying generator^®^^®. Unfortu¬ 
nately, for large systems direct computation is usu¬ 
ally infeasible. Some common approaches involve es¬ 
timation autocorrelation function A(t) of the process 
and then exploiting the relation Tint = 2 /q A(t)dt 
to derive estimates of Tint from the estimates of 
^(^)37,39,4o_ However, if the goal is to terminate the 
simulation when the ergodic average has converged 
appropriately, then these methods are indirect and 
can be computationally intensive. Another common 
approach is to exploit the regenerative structure of 
Markov chains^^ to obtain independent and identi¬ 
cally distributed samples of the process and obtain 
confidence bounds on the ergodic average. However, 
these methods can be inefficient for complex systems 
where the return time to the initial state can be quite 
large. 

We instead turn to the method of batch means®'^’"^^ 
for determining confidence bounds (and thus a mea¬ 
sure of convergence) for the steady-state estimation 
problem inside each fast-class. The use of batch 
means is applicable to a wide range of problems 
(any which satisfy a central limit theorem), and its 
implementation is very straightforward. For a gen¬ 
eral Markov chain X(s) with an observable function 
/, write U(s) = f{X{s)) and Y{t) = l/t fgY(s)ds 
. We denote = E^{/(W)} for the steady-state 
value we wish to estimate. Then under some gen¬ 
eral conditions®® U(s) satisfies a functional Central 
Limit Theorem: 

- U}t>o 

in the sense of weak convergence as e —>■ 0. 

Suppose that t > N{,Treiax, where Treiax is the 
relaxation time of the system and Ny, is a number 
of “batches” (bins) to partition the trajectory into. 
Then the batch means 


Y,{t) ^ 


1 

Wb 


j-kt/Nb 

l{k-l)t/Nt 


Y{s)ds 


are approximately (as t —>■ 00 ) independent and 
identically distributed samples of Af (/tt , Nb/t). 
Thus 




Y{t) - U 
SN^t) 


V 


Tn^-i 
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as t —>■ oo, where is the Student’s t- 

distribution and (t) is the sample variance among 
batches, 

Nb 

k—1 

Thus, for t sufficiently large, a (1 — Sci)W0% con¬ 
fidence interval for the value of /,r is given by 

Y{t) ± MOE {t,Nb,6ci), 

MOE {t, Nb,6ci) = {tquantile) 

V-'^h 

( 22 ) 

and tquantile IS the (1 — i5c//2)th quantile of the Stu¬ 
dent’s t-distribution with Nb — 1 degrees of freedom. 

The usual perspective for applying batch means 
is that one has a fixed set of data {T(s) : s € [0,t]} 
to partition, and then must choose the number of 
batches Nb appropriately so that each batch length 
t/Nb is long enough so that the batch mean er¬ 
rors [yfc(t) —y(t)] are approximately independent, 
identically distributed, and Gaussian. One then 
often chooses Nb to be relatively small (say, 5 to 
30)37,43 ensure the independent and Gaussian 
assumptions hold. When viewing the asymptotic 
structure as the amount of data t grows, then one 
can ensure that the asymptotic central limit theorem 
holds if the number of batches grows as Nb{t) ~ v^. 
In Ref. 42, the authors consider strategies which let 
the number of batches grow if the correlation be¬ 
tween batches is near 0, and otherwise hold Nb{t) 
fixed until the batch correlation decays to 0. 

Since our goal is to simulate only enough (micro¬ 
scale) data so as to determine the steady-state values 
/tt, we instead take the perspective that one has a 
fixed number of batches Nb desired, and that one 
should generate data {T(s) : s € [0,t]} until each of 
the batch means Yk {t) are (approximately) inde¬ 
pendent and identically distributed about /^. For 
a fixed level of precision Spredse, confidence level 
Sc I, and the number of batches (independent sam¬ 
ples) Nb, the Batch-Means Stopping Rule terminates 
the simulation when MOE{t) = MOE{t, Nb, Sci) < 
Spredse, where MOE{t) is defined by (22). Figure 2 
gives a depiction of how the Batch-Means Stopping 
Rule is implemented. 

In addition to giving an on-line estimate of the re¬ 
laxation time of the system, the Batch-Means Stop¬ 
ping Rule gives Nb—1 (nearly) independent sam¬ 
ples of trajectories with initial distribution approxi¬ 
mately equal to the stationary distribution. Further¬ 
more, one can compute the reweighting coefficients 


yxh) 


^^ 3 (^ 1 ) 

YSi) 






yxt2) 

y2(t2) 

V3((2) 

yxt2) 





t2 


FIG. 2: A sketch of the batch-means stopping rule. 

The process is simulated for a fixed number of 
jumps {Nj = 20) to a terminal time ti, and then 
the trajectory is partitioned into A^;, = 4 batches to 
compute the variance between the batch means 
Yk{ti). If the confidence bounds are precise enough 
{MOE(ti) < Spredse), then the simulation is 
terminated and each batch gives an iid sample of 
, tr^). Otherwise, another Nj = 20 jumps are 
simulated and the process is repeated. 


Wk, 0 {t) in each batch to give Nb (nearly) indepen¬ 
dent samples of the steady-state reweighting coeffi¬ 
cients (in a manner similar to the “Time-Averaged 
Gorrelation Function” method of Ref. 23). 

B. Batch-Means Stopping Implementation 

Suppose we have a general reaction network with 
Mr reactions and Mg reaction parameters. Here 
we allow the possibility that Mr ^ Mg for gen¬ 
eral propensity functions Xr{x;9) (e.g. Michaelis- 
Menten kinetics), whose parameter derivatives 
dg^Xr{x', 0 ) we can compute explicitly for all i = 
1,..., Mg and all r = 1,..., Mr- Denote by (^r the 
stoichiometric vector for the rth reaction. Our goal 
is to estimate the gradient V6iE,r(e) {/(A)} for some 
observable function /. 

We introduce the following notation for the batch- 
means stopping rule. X[n) is the nth state of the re¬ 
action network, T(n) is the time of the nth jump and 
/(n) = f{X{n)) for the value of the observable at 
the nth state. E{n) = f (A(s)) ds is the time- 
integrated value of / up to the nth jump, r*(n) is 
the reaction which fires at jump n (taking the system 
from X{n) to X{n + 1)), is the vector in 
with 1 in the ith component and zeros elsewhere. 
R[n,9) £ and B[n,9) £ are the first 

and second terms of (13) with respect to each of the 
parameters 9i & 9 [i = 1,..., Mg). Ng is the num¬ 
ber of batches (approximately independent samples) 
to be used, Sci is the desired confidence level (for 
a (1 — dc'/)100% confidence interval, and Spredse is 
the maximum allowed radius of the confidence in- 
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terval at the stopping time. Nj is the number of 
jumps to simulate before retesting the batches for 
convergence. Then one can write the batch-means 
stopping rule as follows. 

Algorithm 2 (Batch-Means Stopping Rule with 
Sensitivity Estimation). 


(1) Initialize 

. A(0) = lo, /(O) = /(a( 0)), r(0) = 0, 

F{0) = 0, R{Q) = [0,...,0] € 

B{0) = [0,...,0] € tests = 0 

(number of times the data has been tested 
for convergence). Calculate tquantUe = 
(1 — Scil‘2) quantile of a Student’s t- 
distribution with Nh—l degrees of freedom. 


(2) Generate and Record Data Simulate Nj 
jumps and record values immediately after each 
jump. For n = Nj-tests ,..., Nj ■ {tests+1) — 1, 


• Compute Xr{X{n);0), ■^Xj.{X{n),0) for 
all r = 1,..., Mr and all 0^ = 1,... Mg. 
Set Xo{X{ny,0) = Ar(A(n), 0), and 
J-Ao(A(n), 0) = ^K{X{n), 0) for 

all 6 i e 0 . 


• Sample At{n) ^ £xp (^Xo{X{n),0)'^, and 

^ - 

Ao(X(n).0) 

X Xi{X{n);0),...,XMr{X{n);0) 


• Update T{n -I- 1) = T{n) + At{n) 
X{n + 1) = X{n) + Cr*{n) 
f{n + l) = f{X{n+l)) 

F{n -b 1) = F{n) + f{n) ■ At{n) 
i?(n + l,0) = i?(n,0)+Efcle^ 


X 

^K+n) (^X{n),0'^ 

/K+n) 

b{r 

.0) = EfJ^e^■ 

E^E4-Xr{X{n),0) 


B{n -1-1,0)= B{n, 0) + b{n, 0) ■ At{n) 
W{n + 1,0) = R{n +1,0)- B{n + 1,0). 


(3) Test Batches for Convergence 


• Nend = Nj - {tests + 1)= index of last avail¬ 
able data point. 

Y = F {Nend) /T{Nend) = total time- 
averaged value 

hatch = T{Nend)/Nb= time length each of 
batch 

Initialize F^ = 0 (total integral through 
end of previous batch). 


• loT k = 1,..., Ng 

- indB{k) = 

max{n : T{n) < k - hatch}=hide'x. 
of the last jump in batch k. 

- F^{k) = F{indB{k)) + f{indB{k)) 

X [k - hatch - T{indB{k))] - F'®=total 
integrated value of f{X{s)) inside 
batch k. 

— Yk = F^{k)/hatch= fcth batch-mean 

- F^ ^ F^ + F^{k) (update integral to 
end of previous batch) 

• \Yk-Y\^ = variance 
between batches 

• MOE = t quantile * iJs%JNb = margin of 
error for confidence interval 

• If MOE < Sprecise, then go to (4). Else, 
tests •<— tests + 1 and go back to (2). 

(4) Compute LR Weights in each batch. 

Initialize W^{0) = [0,..., 0] € For k = 

l,...,Nb, 

• W^{k,0) = W{indB{k),0) 

-b{indB{k), 0) -[k- hatch - T{indB{k))] 

-W^{0) 

• W^{0) ^ tU^(0) -b W^{k, 0) 

(5) Compute Sensitivity Estimates for 

VeE,(e) {/(A)}: 

• Likelihood Ratio 

. Nb 

LR=— ^/(zndB(fc))IUf(fc,0) 


• Centered Likelihood Ratio 


CLR = LR 

Nb 




Nb 


Nb 




(xndB{k)) 

k—1 

• Ergodic Likelihood Ratio 

1 _ 

ELR=^Ew|(fc,0) 

k—1 

• Centered Ergodic Likelihood Ratio 

Nb 


CELR - ELR - 






Nb 


fe=l 
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V. SIMULATION RESULTS 


0.1 


Here we present numerical results to display the 
performance of the proposed algorithms. In what 
follows, we compare the output of an exact simula¬ 
tion at the single time scale (STS) to the accelerated 
two time scale (TTS) approximation. From (11), 
we expect the differences in observable averages and 
their derivatives to be 0(e). Because differences are 
small, we use a simple test system for which many 
samples can be run to obtain accurate statistics. 

Consider a reaction network with species A, B, 
and C and isomerization reactions given by 

A H" B, bH" A, bHc 

For small values of e, the system becomes stiff as the 
isomerization between A and B reaches equilibrium 
much faster than B is converted to C. A TTS ap¬ 
proximation assumes that A ^ B and B ^ A are 
fast and equilibrated. 

We first compare the output of the accelerated 
TTS simulation against the exact STS simulation 
for varying levels of stiffness e. For our simulations, 
initial conditions of (Aq, Bq^Cq) = (100,0,0) and 
the parameters (fci,fc 2 ,fc 3 ) = (1,1.5, 2) are chosen. 
10,000 replicate (independent) trajectories are run 
for various values of e. Statistics are taken at a ter¬ 
mination time of t = 0.5s. Species averages are cal¬ 
culated as arithmetic averages over the independent 
trajectories while sensitivities are computed with the 
CLR method shown in (15). The error due to statis¬ 
tical averaging is estimated using t-test statistics for 
averages and a bootstrapping method for sensitivi¬ 
ties. Sensitivities with respect to the “slow” param¬ 
eter ks are displayed for each species. As discussed 
in previous works^^, sensitivities with respect to pa¬ 
rameters related to fast reactions encounter signifi¬ 
cant noise, and thus we omit them in order to clearly 
observe the difference between STS and TTS. Fig¬ 
ure 3 shows the disparity between the STS and TTS 
systems for various values of e. Errors are normal¬ 
ized by the TTS value such that Error = 

Indeed, one observes the difference is proportional 
to e, as expected from Corollary II.5 and (11). 

Next, the CLR and CELR methods from Sec¬ 
tion III A are tested in performing sensitivity anal¬ 
ysis in a TTS system. The reaction network de¬ 
scribed in Section IIB is simulated using Algorithm 
1. To assess convergence of the microscale distribu¬ 
tion, the batch-means stopping criterion described 
in Section IV is used with a tolerance of i5 = 0.05. 
1000 replicate trajectories are run to a time hori¬ 
zon oi tf = 100s. The initial conditions used are 



0 0.025 0.05 0.075 0.1 


e 

FIG. 3: Normalized error of the two-time-scale 
(TTS) (accelerated, approximate) simulation from 
the exact single-time-scale (STS) simulation. The 
plot (a) shows errors in species averages while the 
plot (b) shows errors in sensitivities of each species 
with respect to ks. Points indicate simulation 
statistics while dashed lines confirm the linear 
trend. The error bars are 95% confidence intervals 
of statistical noise. 


(Aq, Bo,*o) = (30,60,10) and the parameters used 
are (of, al,/3i,,02,/^a) = (1/e, 1.5/e,2,1,0.4). 

Species populations, time-averaged species popu¬ 
lations, and trajectory derivatives are recorded for 
each run over time. Using these recorded statistics, 
the sensitivities for all 15 (3 species and 5 parame¬ 
ters) species/parameter combinations are computed 
at each time point. Figure 4 shows the time evolu¬ 
tion of the normalized errors in sensitivity estimates 
of the species B over time. Estimated values from 
simulation are referenced to the analytical answer 
as computed from a differential algebraic equation 
(see Appendix C) and normalized by that amount 


so that Error = 


estimated—analytical 
I analytical I 


As expected, CLR estimates are noisy, with vari¬ 
ance that grows linearly with time. At short times, 
the variance is small enough to obtain reasonable 
estimates. As time increases, the noise becomes sig¬ 
nificant with respect to the actual values (the magni¬ 
tude of the normalized error becomes comparable to 
1). In contrast, the ergodic likelihood ratio (CELR) 
fails at short times with a noticeable bias. However, 
the bias, which exists due to a relaxation period, 
decays as 0 {l/t) when time increases and the sys¬ 
tem approaches its steady-state. The variance of 
the CELR estimates remain constant because the 
variance of trajectory derivatives increases linearly 
in time while the variance of ergodic species aver- 
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FIG. 4: Time evolution of normalized errors 
obtained for sensitivities computed by CLR and 
CELR estimators. Simulation estimates are 
referenced and normalized by the analytical 
solution at each time point to compute the error. 

The graphs (a), (b), and (c) show estimates of 
different sensitivity indices for the species B. One 
observes that the variance of the CLR estimates 
increases with time, making for inefficient 
estimation. In contrast, the CELR estimates (the 
solid red line) converge quickly to the steady state 
sensitivities with variance which is roughly 
constant with time, allowing for efficient 
steady-state sensitivity estimation. 


ages is proportional to 1/t. At long times (where 
the CLR is too noisy for efficient estimation), the 
ergodic likelihood ratio obtains accurate estimates 
with very low variance. Therefore, it is advisable to 
use the CLR method for short times (in the tran¬ 
sient regime) and the CELR method for long times 
to obtain steady state values. 

Table I shows the error and statistical noise of the 
CLR and CELR estimations of sensitivities of the 
species B at a short {t = 1.3s) and long {t = 100s) 
times. Statistical noise is obtained from bootstrap¬ 
ping the samples used to compute the sensitivity 
estimates. At t = 1.3, the CLR method has low 
error (theoretically, there is no bias) as well as low 
variance. The CELR estimator has a similarly low 
variance, but high error (due to the 0 {l/t) bias). 
At t = 100s, the CLR estimates have much higher 
variance which induces large empirical error. In con¬ 
trast, the bias of the CELR estimate decreases in 


time while the variance remains low, providing very 
small empirical errors at large times. 

TABLE I: Comparison of the error and statistical 
noise for the CLR and CELR estimators at a short 
time (during the transient regime) and at a long 
time (corresponding to steady state). Values in the 
table refer to the sensitivity of the species B with 
respect to the parameter given by the row label. 
Values are reported as a percent of the analytically 
obtained sensitivity value. 


Percent Error 


CLR CELR 
t = 1.3s 

CLR CELR 
t = 100s 

ai 

-1.4 

-40.0 

-83.5 

0.0 

(y.2 

-3.0 

-40.3 

-63.0 

-1.1 

pi 

0.2 

-39.1 

-64.1 

1.2 

P2 

-1.1 

-16.7 

-22.6 

-2.8 

P3 

-1.4 

-23.3 

20.6 

-0.1 


Half-length of 95% 


Confidence Interval 


ai 

12 

8 

99 

11 

(y.2 

12 

7 

94 

11 

pi 

12 

7 

91 

11 

P2 

13 

10 

108 

12 

P3 

21 

14 

171 

18 


VI. CONCLUSIONS 

This work develops a Two-Time-Scale (TTS) 
framework for multiscale reaction networks. By de¬ 
composing the system into “fast-classes”, one can 
approximate the behavior of the multiscale sys¬ 
tem by a lower-dimensional, single-scale, “macro- 
averaged” reaction network. By applying a singular 
perturbation expansion of the underlying probability 
measures, we have established rigorous bounds on 
the bias induced by the approximate macro-averaged 
model. We then proposed a TTS algorithm for simu¬ 
lating the macro reaction network, using an adaptive 
batch-means stopping rule for determining when the 
micro-scale dynamics have sufficiently equilibriated. 

In addition, we have shown that the sensitivities 
of the macro-averaged system provide accurate ap¬ 
proximations for the multiscale system. Since the 
macro-averaged system is single-scale, it is possi¬ 
ble to incorporate most existing sensitivity estima¬ 
tion methods to the TTS algorithm to obtain esti¬ 
mates of the system sensitivities. We proposed an 
Ergodic Likelihood Ratio estimator for steady-state 
sensitivity analysis, and demonstrated how it can be 
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adapted to the Two-Time-Scale algorithm. A sim¬ 
ulation was then used to confirm the analytic error 
bounds and demonstrate the efficiency of the TTS 
Ergodic Likelihood Ratio estimator. 
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Appendix A: Analytic Stationary Sensitivities 

For ergodic systems whose state space is relatively 
small (such that the generator Q{6) can be explic¬ 
itly constructed), one can compute the steady-state 
probability vector 7r(0) by solving the linear system 
t: ■ Q = 0 and tt • 1 = 1. Here, we show that one can 
express the sensitivities of the steady-state measure 
^ 7 r( 0 ) explicitly as a linear transformation of the 
nominal measure 7 r( 0 ). The representation we shall 
construct is an adaptation of the discrete-time tech¬ 
nique in Ref. 44 to continuous-time Markov chains, 
and exploits the algebraic properties of the pseudo¬ 
inverse Q'^ of Q. 

By differentiating tt{6 ) ■ Q{9) = 0, we have the 
relation 




Then by expanding with the Moore-Penrose pseudo¬ 
inverse, we have 




-dQ 

89 




for some vector w. 

Now, we see by the projection property of 
= (QQ^y = QQ'^ that the operator 
I — QQ~^ is the projection operator onto the ker¬ 
nel of Q' = span of tt', so that (/ — QQ^)w = 'yir' 
for some scalar 7 . Thus we have the relation 


diT 


= TT 



-I- 77r 


Now, we can see that 7 r( 0 )-l = 1, so that ^-1 = 0. 
Thus, we have 

so that 

7 = 77rl = TT—g^l. 

Putting the above together, we can write ^ as 

For reaction networks with relatively small state 
space A4, (Al) can provide a tractable method of 
computing the sensitivities of the steady-state mea¬ 
sure and thus of the steady-state expected value 


^ a/(x;0) , 


x^M. 


x^M. 


(A2) 


For example, for reaction networks with mass-action 
propensities, the entries of Q shall always be linear 
in the parameters 9. Therefore, the derivatives ^ 
are easy to compute (whereas the stationary dis¬ 
tribution 7r(0) can be quite complex as a function 
of 9). Thus by computing the pseudo-inverse Q'^ 
from Q at the nominal value of 6 (e.g., via singu¬ 
lar value decomposition), one can analytically com¬ 
pute the system sensitivities without need of Monte 
Carlo simulation. For larger reaction networks, it 
may be possible to combine the Finite State Projec¬ 
tion method^®d 6 -^jth this pseudo-inverse technique 
to estimate the exact sensitivities by computing the 
analytic sensitivities of the reduced system. 

Lastly, we use this representation to show that 
(21) holds. Write the exact generator Q® as Q® = 
g(r) = (l/e)g(a) + Q(/3) = g(a") + Q(/3). For 
a fast reaction parameter a? = ai/e a chain rule 
gives 9a^A(x;a®) = [9afA(x;a^)] /e, from which it 


follows (using (3)) that e dai Q(ot^) 


daiQ(a) = 


dafQ(a^). Putting this relation into (Al), we then 
have 




for some 7 G M. It remains to determine 7 to have 
a method of relating the sensitivity coefficient to a 
linear transformation of tt. 
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(A3) 













Similarly we see that dui f{x, a^) = 

[9a? f{x, a^)] /e. Finally, putting these rela¬ 
tions into (A3) we obtain (21), 


—E,. {/(A; /3)} = e—E^e {/(A; /3)} . 


Appendix B: Proofs of Results 

In this section we outline the proofs on the error 
bounds of the averaged reaction network and con¬ 
vergence of the sensitivities. The error bounds in 
this work are largely direct applications of the re¬ 
sults in Ref. 20 to the Two-Time-Scale reaction net¬ 
works formulated here. We present an overview of 
the proofs for insight and completeness. Similarly, 
the sensitivity convergence result comes from Ref. 
14; we shall only how to ht their result to the Two- 
Time-Scale framework. 


1. Proof of Theorem 11.4: 


Using the formulation of the exact generator 
Q‘^{9) = {l/e)Q{a.) Q{f3), the error bound on the 

induced probability measures of the exact and aver¬ 
aged systems is a direct application of Theorem 4.29 
of Ref. 20. We outline the main steps below. 

Write p^(t) € for the probability measure 

of the exact system at time t. From the Kolmogorov 
forward equation (a.k.a. Chemical Master Equa¬ 
tion), we have 


dpHt) 

dt 


fit) 


Q{ol)+Q{I3) 


(Bl) 


Dehne the differential operator on functions with 
values in by — f{Q -|- eQ). Then 

= 0 if and only if / solves the CME (Bl). The 
form of the differential equation (Bl) suggests the 
plausibility of a singular perturbation expansion of 
Pft) by 


fit) = ^ e>z(t) + ^ eVi 
2—0 2 — 0 

Assuming for the moment that such a representation 
holds, we proceed to derive the form of the “regular” 
terms <(>i(t) and the “boundary layer” terms ^i(t). 
Applying to (B2) and equating terms of e leads 



to the recursive equations 


: <t>oit)Q = 0 

: 4>iit)Q= 4>oit)Q 

e* 

: Ut)Q = - f-iit)Q 

and similarly, using the “stretched-time” 
T = tje one has equations for 0(t) 

£0 

■■ ‘‘f'’ = Mr)Q 

: =0i(t)Q ipoifQ 

e* 

■ =fifQ i’i-iifQ. 


(B3) 


(B4) 


At f = 0, 


Y,e^{ct>m + fm=fif. (B5) 


i=0 


so (/>o(0) + ’ipoiO) = P®(0) and <^^(0) -I- ipiiO) = 0 for 
all i > 1. Since p^{t) is a probability measure with 
p®(t) • 1 = 1, by sending e —>■ 0 in (B2) it follows that 


■ 1 = 1 and (j)i{t) ■ 1 = 0 


(B6) 


for all t € [0, T\ and alH > 1. 

Turning to the leading regular term (jift), we 
note that (j)o(t) ■ Q = 0 is not uniquely solv¬ 
able because Q = diag[Q9)^ _ _ Ql-'Vclj 
m — Nc However, writing (t) for the sub-vector 
of (poit) corresponding to fast-class Mk, then we 
must have = 0 for all fe = 1,..., Ac. 

Since each is an irreducible generator, we then 
have for some scalar multi¬ 
plier It can be seen (Prop 4.24^°) that 

^(fc)(i) 

= P), (t), where p{t) is the probability measure 
among fast-classes A4i,..., A4 nc induced by gener¬ 
ator Q (as in (5)) and initial distribution p®(0) • 1 = 
P{A®(0) G A4fc}. This in turn determines a unique 
solution for and therefore poit). It follows 

that (pft) is exactly the measure p^ in Theorem II.4 
induced by the TTS simulation procedure. 

With (pft) determined, (B5) then gives the initial 
condition ' 00 ( 0 ) = p'^(O) - 0o(O) = p'^(0)[/m - 1^], 
from which one can solve (B4) to obtain '0o(t) = 
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'i/'o(0) ■ exp|(5r|. It can be shown (Prop. 4.25^°) 
that 

||^/’o('r)|| < Cexpl-Kr} , (B7) 

where C depends on the Jordan-Form of Q. Higher 
order terms can also be solved for recursively (Prop. 
4.26^°), and it can be shown (Prop. 4.28^°) 

n n 

sup II ^ £'4>i{t) + ^ e'il)i{t/£) - p^{t)\\ = 0(e:”+^) 

0<‘<r i=o i=0 

(B8) 


apply the Chapman-Kolmogorov equations to see 

P{X^(t„)=y„,...,Z^(ti) = yi} 

X P{x^(t2) = 

xF\x^{h) = xf^^Y 

Then applying the error bound (B9) to each transi¬ 
tion term to obtain 


In particular, using the Oth-order expansion we have 

K-p^ll = ll</>o(T)-p^(r)|| 

< wmt) + Mt/£)-p%t)\\ + \\Mt/£)\\ 

< ||0(e) + 0(exp-Kr/e)|| (B9) 

□ 



Jvi) 

■^31 




Vi-i 


} 


as e —>■ 0 , and thus the finite dimensional distribu¬ 
tions converge. 

□ 


2. Proof of Corollary 11.5 

With the singular perturbation bound (B 8 ), 
Corollary II.5 follows immediately. Since the exact 
process X^{t) is ergodic, there exists a time horizon 
T® such that ||p^(t) — 7 r®|| < £ for t > T^. Simi¬ 
larly, by (B7) there exists T such that ||'!/’o(i)|| < e 
for t > T, and ||p(t) — 7 f|| < e for t > T, implying 
that \\(j)o{t) — TfS^II = \\p{t)n — 7 f 7 r|| < e. Then taking 
T > max{r®,r, T} and applying (B 8 ) , we have 


4. Proof of Proposition lll.l 

Proposition III.l is simply the application of The¬ 
orem 3.2 of Ref. 14 to the TTS framework. The 
method and framework for separating time-scales in 
Ref. 14 is slightly different than the TTS frame¬ 
work used here, but it can be seen that the two are 
equivalent. Here, we briefly review the multiscale 
framework of Ref. 14 and show how one can trans¬ 
late between their “remainder spaces” and the TTS 
“fast-classes”. 


\\p^t) - TfrFII < \\p%t) - (j){t) - tlj{t/£)\\ 

+ WHt) - ^11 + IIV'(i/e)ll <0{£ + exp {-Kt/e}) 
IItt^^-wS'II < \\tt^- p^T)\\ + \\p%T)-W^ < 0{e) 

and the corollary follows. □ 


3. Proof of Proposition 11.3 

This is a direct application of Theorem 5.27^°, and 
also follows from the error bound (B 8 ). First, one 
uses (B9) to establish that 


lim lim E 

t —^0 £—^0 


X^{s + t) - X'^(s)|X^(s) = X 


(fc) 

3 


= 0 


and thus {^^(■)}£>o tight. Then one shows that 
the finite-dimensional distributions converge by tak¬ 
ing arbitrary time points 0 < H < • • • < < T and 


a. Scaling Rates and Remainder Spaces 

As in Refs. 13, 18, Ref. 14 considers reaction rates 
of the form a^{x,9) = Xk{x,6) which scale 

with the “system size” or “normalization parame¬ 
ter” N ^ 0, where pk is the scaling rate for the fc-th 
reaction channel. For a given normalizing parameter 
N, the corresponding system is denoted by X^{t). 
One analyzes the system against a reference time 
scale 7 by Xl^ (t) = X^(tN'^). For a given normal¬ 
izing parameter iVo ^ 0 , the corresponding system is 
denoted by X^° (t). One analyzes the system against 
a reference time scale 7 by X^ (t) = X^ltN'*). 

The scaling rates pk determine the time scales 
at which the reaction channels fire. For a system 
with with a single level of stiffness, there are only 
two scaling rates, Pfast > Psiow, which partition 
the reaction channels as either fast or slow. Write 
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TI = {k pk = pfast} for the fast reaction index set, 
and similarly r 2 for the slow reaction index set. 

Take S 2 = {ri £ : {v,Ck) = 0 for all k £ Ti} 

so that {Xl^{t),v) is unchanged by fast reactions. 
Then take L 2 = span(§ 2 ) and 112 be the projection 
map from to L 2 , so that n 2 Cfc = 0 for all £ Ti. 

Let Li = span{(/ — Ii 2 )x : x £ A^}, and for any 
V £ 112^1 let ]HI„ = {y £ Li : y = (I — 112 ) 0 ;, 1120; = 
v,x G A4}, the set of remainders of elements in Ai 
which get projected to v. Then we can define an 
operator C’' by 


^ Afe(o; + 2 , e)[f{z + Ck) - f{z)] 

feGFi 


which is a generator of a Markov chain with state 
space ]HI„ (note that y £ ]HI„ => y + Cfe £ for all 
k £ Ti). 

Assuming ]HI„ under C’' is ergodic, there is a sta¬ 
tionary distribution tt’' . Then for each slow reaction 
k G T 2 one can define the “averaged” propensities 
Xk{v,9) = Xk{v + z,9)Tr'"{dz) for all v £ TI 2 M . 
Using the random time change representation, define 
the Markov chain on Ii 2 Ai by 


Xg{t) — II2XQ + Tfc 

fcers 


Xk{X{s),9)ds ) n 2 Cfc 
(BIO) 


Taking 72 = —psiow as the slow time scale, one has 
Ii2X^^ 0 => Xg as A —>■ 00 ^^ under more general 
conditions than Assumptions 11.1,11.2. Under this 
context, Theorem 3.2 of Ref. 14 states that 

(*))} = |E{/.(.t.«))} 

(Bll) 


where fg{v) = f{v + y) 7 r^(dy). 


b. Equivalence of Fast-Classes and Remainder Spaces 

Here we show how the TTS framework is equiv¬ 
alent to the scaling rate framework. Consider a 
TTS reaction network as described by (3). Taking 
N = l/£, Pfast = 1, Psiow = 0, it is easy to see that 

X^^g{t)=X^{t)^X^t). 

so lim^v-s-oo A^g(t) = lim£_j.o A'®(t). It remains to 

identify fg (^Xg{t)^ from (BIO), (Bll) with / (X(t)) 

from ( 8 ). We do so by showing the equivalence of 
the fast-classes A4i and the remainder spaces Hi,. 


Lemma B.l. The projection map II 2 is invariant 
on fast-classes Aii- The set of remainder spaces 
{H„ : V G n 2 Al} is in one-to-one correspondence 
with the set of fast-classes {Mi}. Additionally, each 
X G Ml corresponds to a unique element y G H„ for 
some V £ n2A4. 

Proof. Define p : {Mi} {H„ : v G n 2 Al} by 

p(Mi) =Hn 2 (x) for any a; £ TW;. 

Then p is well-defined, since x,y G Mi implies that 
y = a;-l-X]fceri CfcCfe for some Ck G N, and n 2 (Cfe) = 0 
for all /c £ Ti gives n 2 (y) = n 2 (a;). Clearly, p is also 
onto. 

It remains to establish p is injective. It is sufficient 
to show that if n2(a;) = v and n2(a;') = v for x, x' G 
M, then x and x' belong to the same fast-class Mi. 
Since n2 projects onto the span of the complement 
of spanjCfc : fe £ Ti}, we have v - x = J^k^Ti OfcCfc 
and V — x' = OfcCfe for some Cfc, c} G R. Then 

x' = X -\- - Ck)Ck, with x',x,(k G so 

it follows that {ck — c}) G N for all k. Hence x and 
x' communicate by fast reactions and thus belong to 
the same fast-class Mi. Therefore, n2 is invariant 
on fast-classes and p is injective. 

Finally, since n 2 (-) is invariant on fast-classes Mi, 
it follows that x —>■ a; — n 2 (a;) bijectively maps el¬ 
ements of UiMi to elements of U„gn 2 (A<)II« such 
that X, x' G Ml implies (x — n 2 (a;)), {x' — Il 2 (a;')) £ 
®In2(7Wi)- n 

Because of the direct correspondence between 
{n2 {x) : X G M} and {Mi}, we see (upon reorder¬ 
ing states) that for v = \i 2 {Mi), we have = 
, so that Xk{v,9) = Xk{Mi,9), and fg{v) = 
f{Mi,9). Thus, fg (^X{t)'j has the same distri¬ 
bution as f {Mi,9) and so ^fg (^Xg(t)^'^ = 
^E{/ (Afg(t))}. Therefore, using (Bll) we have 

hm dgE{f iX%t))} = hm a,E {/ (X^^git))} 

e^O N^oo ’ 

= dgE {fg {Xg{t)) } = S^E { / (X{t)) } 

(B12) 


and hence Proposition 111.1. 


Appendix C: Analytic solution of the Model System 

In a well-mixed system with linear propensities, 
the time-evolution of the system can be obtained 
from a set of ordinary differential equations (ODE). 
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(b) Reactions for the 
benchmark network. 


FIG. 5: Description of model chemical reaction 
network. 


The single time-scale (STS) system can be mod¬ 
eled with a system of ODEs. The two time-scale 
(TTS) system imposes algebraic constraints for the 
fast modes, resulting in an algebraic differential sys¬ 
tem of equations. In both cases, a set of adjoint 
ODEs can be used to compute sensitivities along¬ 
side species populations. 

In our model system, gaseous species A adsorbs 
onto a catalyst surface, isomerizes to species B, and 
then desorbs. A diagram of the reaction network 
is shown in Eigure 5a. The reactions along with 
their rate laws are shown in Table 5b. Na, Nb, and 
denote the surface coverages of species A, B, and 
empty sites respectively. The adsorption/desorption 
of species A is assumed to be much faster than the 
others. The separation of time scales is captured 
with the dimensionless parameter £ << 1. The sys¬ 
tem contains M = 3 species and R = 6 reactions. 
Mathematically, we use the M x 1 column vector N 
to specify the species populations, where Ni = Na, 
N 2 = Nb, and N 3 = iV*. 

The linear dependence of the reaction rates is writ¬ 
ten as 


r{N) 


‘ £ ^aiN^ ■ 
£“^02-^1 
PiNi 
132^2 

. I33N2 _ 


(Cl) 


The transformation matrix 


T = 


1 0 0 
0 1 0 
1 1 1 


(C3) 


yields y = T ■ N and T can be decomposed into 
Tf = [1 0 0 ] and Tg = q q ^ slow 

modes by looking at the 0 rows of S'j:. This gives us 
the transformed variables as j// = [ A^i ] and ys = 


N 2 

NI + N2 -\- N3 

In the context of our example problem, we can 
assign physical meaning to the transformation: The 
variable j/i = Na is affected by both slow and fast 
reactions. Eor a given set of slow variables, we can 
solve for yi to specify the equilibrium constraint of 
ri = r 2 . The variable ?/2 = Nb is unaffected by the 
fast adsorption/desorption of A, but is affected by 
the slow reactions. Einally, the variable 2/3 = Na + 
Nb+N^, is a second ’’slow” variable. In this example 
2/3 = 1 applies at all times due to stoichiometric 
constraints. However, it is still identified as a ’’slow 
mode” because this constraint is not a consequence 
of disparities in reaction time scales. 

The system is simulated with the choice of param¬ 


eters A^o = 


30 

60 

10 


, ai = 1, 02 = 1-5, /3i = 2,132 = 1, 


133 = 0.4, £ = 0.01. The simulation results are shown 
in Eigure 6 . Table II shows values at t = 1.3s and 
t = 100s along with CLR and CELR estimates with 
statistical confidence intervals. Derivatives with re¬ 
spect to (32 and (33 overlap because both affect sys¬ 
tem properties through the the independent param¬ 
eter (32 + ( 33 - 

In general, the system parameters need not be the 
rate constants themselves. A different parameter¬ 
ization would involve a transformation of the rate 
constants. Sensitivities could be obtained through a 
chain rule. 


Each row of the stoichiometric matrix corresponds 
to a different species, which are Ni, N 2 , and N 3 re¬ 
spectively. The columns correspond to each of reac¬ 
tions 1-5 in order. Extracting the information from 
Table 5b and putting it in mathematical form gives 
the M X R stoichiometric matrix 


5 = 


1 - 1-11 0 
0 0 1-11 
-11 0 0-1 


(C2) 
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FIG. 6: Network results for the two time-scale 
system. Graphs show population counts (top left), 
derivatives of species A (top right), derivatives of 
species B (bottom left), and derivatives of empty 
sites (bottom right). 


TABLE II: Comparison of the CLR and CELR 
estimators with the ODE solution. Values show 
actual values rather than errors. Values in the table 
refer to the sensitivity of the species B with respect 
to the parameter given by the row label. 95% 
confidence intervals are based on statistical noise. 



ODE 

t = 1.3s 
CLR 

CELR 

ai 

11.9 

11.8 ± 1.5 

7.2 ±0.9 

a2 

-7.9 

-7.7 ± 1.0 

-4.7 ±0.6 

pi 

9.9 

10.0 ± 1.2 

6.0 ±0.8 

P2 

-17.4 

-17.2 ±2.4 

-14.5 ± 1.7 

P3 

-17.4 

-17.1 ±3.4 

-13.3 ±2.4 



t = 100s 
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CLR 

CELR 
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2.3 ± 13.7 

13.9 ± 1.4 
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-9.3 
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-9.2 ± 1.0 

Pi 
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4.1 ± 10.6 

11.7± 1.2 

P 2 

-16.5 

-12.8 ±18.1 

-16.1 ± 2.0 

Ps 

-16.5 

-19.9 ±29.0 

-16.5 ±3.2 
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